The concept of the variable allows us to quantify various aspects of our observations.
nominal/categorical: These variables have a limited number of levels which cannot be ordered in a meaningful way. For instance, it does not matter which value of SUBORDTYPE or MORETHAN2CL comes first or last:
unique(cl.order$SUBORDTYPE)
[1] "temp" "caus"
unique(cl.order$MORETHAN2CL)
[1] "no" "yes"
ordinal: Such variables can be ordered, but the intervals between their individual values are not meaningful. Heumann (2022: 6) provides a pertinent example:
“[T]he satisfaction with a product (unsatisfied–satisfied–very satisfied) is an ordinal variable because the values this variable can take can be ordered but the differences between ‘unsatisfied–satisfied’ and ‘satisfied–very satisfied’ cannot be compared in a numerical way”.
In the case of interval-scaled variables, the differences between the values can be interpreted, but their ratios cannot. A temperature of 4°C is 6 degrees warmer than -2°C; however, the ratio \(\frac{4}{-2} = -2\) does not tell us how many times warmer 4°C is than -2°C. This is because the Celsius scale has no true zero point; 0°C is simply another point on the scale and does not signify the absence of temperature altogether.
Ratio-scaled variables allow both a meaningful interpretation of the differences between their values and (!) of the ratios between them. Within the context of clause length, LENGTH_DIFF values such as 4 and 8 not only suggest that the latter is four units greater than the former but also that their ratio \(\frac{8}{4} = 2\) is a valid way to describe the relationship between these values. Here a LENGTH_DIFF of 0 can be clearly viewed as the absence of a length difference.
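The four scale types above can be sketched in base R. This is only a minimal illustration: the objects subord, satisfaction, celsius, kelvin and len are hypothetical toy vectors and are not part of cl.order. The unit change by a factor of 10 is likewise an arbitrary assumption.

```r
# Hypothetical toy values; none of these objects come from cl.order

# Nominal: an unordered factor, its levels have no rank
subord <- factor(c("temp", "caus", "temp"))

# Ordinal: an ordered factor, its levels are ranked but not evenly spaced
satisfaction <- factor(
  c("satisfied", "unsatisfied", "very satisfied"),
  levels = c("unsatisfied", "satisfied", "very satisfied"),
  ordered = TRUE
)

is.ordered(subord)                  # FALSE
is.ordered(satisfaction)            # TRUE
satisfaction[1] > satisfaction[2]   # TRUE: "satisfied" ranks above "unsatisfied"

# Interval: differences survive a shift of the zero point, ratios do not
celsius <- c(-2, 4)
kelvin  <- celsius + 273.15
diff(celsius)                       # 6
diff(kelvin)                        # 6 (up to floating-point rounding)
celsius[2] / celsius[1]             # -2
kelvin[2] / kelvin[1]               # roughly 1.02

# Ratio: with a true zero, ratios stay the same under a change of unit
len <- c(4, 8)
len[2] / len[1]                     # 2
(len * 10)[2] / (len * 10)[1]       # 2
```

Comparison operators such as > are only defined for ordered factors; applying them to subord would return NA with a warning, which mirrors the point that nominal levels cannot be ranked.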
2.3 Introduction to ggplot2
2.3.1 Building a ggplot
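A ggplot requires at minimum three elements: (1) a data frame, (2) an aesthetic mapping that assigns variables to the axes via aes(), and (3) a plotting option (also known as a “geom”). The data frame and the mapping are passed to ggplot(); geoms and any further layers are added with the + sign.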
# Load ggplot2 (if not already loaded)
library(ggplot2)
# Supply data frame
ggplot(data = cl.order,
       # Map variables to the x- and y-axes
       mapping = aes(x = LEN_MC, y = LEN_SC)) +
  # Set plotting option (here: scatterplot)
  geom_point()
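To see how the three elements fit together, the plot can also be assembled step by step and stored in an object. The sketch below assumes ggplot2 is installed and uses a hypothetical toy data frame, toy, whose column names mirror cl.order but whose values are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy data; column names mirror cl.order, the values are invented
toy <- data.frame(
  LEN_MC = c(5, 8, 12, 7, 10, 6),
  LEN_SC = c(9, 6, 11, 4, 13, 8)
)

# Step 1: data and aesthetic mapping only; on its own this draws empty axes
p_base <- ggplot(data = toy, mapping = aes(x = LEN_MC, y = LEN_SC))

# Step 2: adding a geom with + returns a new plot object with one layer
p_points <- p_base + geom_point()

# Printing the object renders the scatterplot
print(p_points)
```

Storing intermediate plots in objects such as p_base makes it easy to reuse the same data and mapping with different geoms.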
2.3.2 Adding layers
Visualise a third variable using the color argument inside the aes() function. Further variables can be mapped to other aesthetics such as shape.
ggplot(data = cl.order,
       mapping = aes(x = LEN_MC, y = LEN_SC)) +
  geom_point(aes(color = ORDER, shape = SUBORDTYPE)) +  # (1)
  labs(                                                  # (2)
    title = "Length of main and subordinate clauses",
    subtitle = "Dimensions for different ordering types",
    x = "Length of main clause",
    y = "Length of subordinate clause",
    color = "ORDER",
    shape = "SUBORDTYPE"
  ) +
  theme_classic()                                        # (3)
(1) Map variables to axes, colours and shapes.
(2) Add a title, a subtitle, axis labels and legend titles.